Presentation: "What's Ahead for Big Data?"
Time: Tuesday 11:00 - 11:50
Location: Walton South
Apache Hadoop, based on MapReduce, and NoSQL databases are the current darlings of the Big Data world. The MapReduce computing model decomposes large data-analysis jobs into smaller tasks that are distributed around a cluster. MapReduce itself was pioneered at Google for indexing the Web and other computations over massive data sets. For cost-effective storage at scale, NoSQL databases offer various storage models and availability vs. consistency trade-offs.
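To make the decomposition concrete, here is a minimal sketch of the MapReduce computing model in plain Python: a map phase that emits key-value pairs, a shuffle that groups values by key, and a reduce phase that aggregates each group. This is an illustration of the model only, not the Hadoop API; in a real cluster the framework distributes each phase across many machines.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(values) for word, values in groups}

# The canonical word-count example, run through all three phases.
docs = ["big data big compute", "data at scale"]
counts = reduce_phase(shuffle(map_phase(docs)))
```

Every MapReduce job, however elaborate, fits this shape; the inflexibility discussed below comes from having to express all computation in these two narrow phases.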
The strengths of MapReduce are cost-effective scalability and relative maturity. Its weaknesses are its batch orientation, making it unsuitable for real-time event processing, and the inflexibility of the MapReduce computing model.
Storm is emerging as a popular complement to Hadoop for general event processing. NoSQL databases can meet some event-processing requirements, too.
The challenges of programming with MapReduce are best addressed with higher-level APIs and Functional Programming languages that provide common query and data-manipulation abstractions, making MapReduce programs easier to implement. Longer term, however, we need new distributed computing models that are more flexible for different problems and that provide better real-time performance. Besides Storm, Spark is a new, general-purpose system for distributed computing. Graph systems, such as Google's Pregel, address problems that are best handled with graph algorithms.
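The payoff of those higher-level functional abstractions is easy to see: the three-phase word count above collapses into a single declarative pipeline. The following sketch uses Python's standard library as a stand-in for the kind of collection-oriented operations that tools such as Spark's API provide; it is illustrative, not any particular framework's code.

```python
from collections import Counter
from itertools import chain

docs = ["big data big compute", "data at scale"]

# One declarative pipeline: flatten documents into a word stream, then count.
# A distributed framework exposes the same functional style over a cluster.
counts = Counter(chain.from_iterable(doc.lower().split() for doc in docs))
```

The programmer states *what* to compute; deciding *how* to partition and schedule the work is left to the framework, which is the essence of the higher-level APIs discussed above.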
Similarly, the NoSQL world is changing. Even as the established players mature, newer entrants to the market reflect the lessons learned from the pioneers, including a renewed interest in the Relational Model!
I’ll examine the current Big Data landscape, discuss where it’s going in the near term, and speculate about the future. I’ll make the case that Big Data is essentially Applied Mathematics, a “killer app” for Functional Programming. To succeed, Big Data systems must embrace this essential truth.